On Regex
__TOC__

== References ==
* Regular Expressions in Single UNIX Specification
* Regular Expressions in POSIX.1-2008
* Tutorial on Regular-Expressions.info
* Regex Reference
* Regexp Syntax Summary

== Syntax ==

=== Character Classes ===

==== Character class expressions ====
* POSIX Bracket Expressions
** POSIX bracket expressions are a special kind of character class.
** [:alnum:] = [a-zA-Z0-9], [:alpha:] = [a-zA-Z], [:digit:] = [0-9], [:lower:] = [a-z]

{|
! POSIX class !! Description !! ASCII !! Shorthand !! Java
|-
| … || … || …~] || || \p{Punct}
|-
| [:space:] || All whitespace characters || [ \t\r\n\v\f] || \s || \p{Space}
|-
| [:word:] || Word characters || [A-Za-z0-9_] || \w ||
|}

==== Shorthand Character Classes ====
* Shorthand Character Classes

=== Quantifier ===
* Possessive Quantifiers

=== Anchor ===
* Anchors
* BRE Expression Anchoring
* ERE Expression Anchoring

=== Case Conversion ===
* Context and Case Conversion

=== Readings ===
* Regular Expression Flavor Comparison
* Know your regular expressions from developerWorks
* Substitutions in Regular Expressions (MSDN)

== BRE vs ERE ==
* POSIX
** Basic Regular Expressions
** Extended Regular Expressions

== Regex Dialects ==
* Comparison of regular expression engines

=== Java ===
* API documentation of java.util.regex.Pattern
* Java Regex Tutorial
* Java Regex API

==== Embedded Flags ====

==== Sample Codes ====

===== Check matching string =====
<syntaxhighlight lang="java">
import java.util.regex.Pattern;

// One-shot check: the entire input must match the pattern.
boolean b = Pattern.matches("a*b", "aaaaab");   // true
// Compile once, then reuse the pattern for several inputs.
Pattern p = Pattern.compile("a*b");
boolean b1 = p.matcher("aaaaab").matches();     // true
boolean b2 = p.matcher("bbbbb").matches();      // false: only a single b may follow the a's
</syntaxhighlight>

===== Find matching string =====
<syntaxhighlight lang="java">
// Assumes RegExUtils from Apache Commons Lang 3 (org.apache.commons.lang3.RegExUtils).
// (?s) turns on DOTALL so that . also matches line terminators; "$1" keeps only the captured group.
final String name = RegExUtils.replaceFirst(output, "(?s).*((?:\\w|\\s)*).*", "$1");
final String country = RegExUtils.replaceFirst(output, "(?s).*((?:\\w|\\s)*).*", "$1");
final String age = RegExUtils.replaceFirst(output, "(?s).*((?:\\w|\\s)*).*", "$1");
</syntaxhighlight>

==== Readings ====
* The DOTALL mode in Java regular expression

=== Perl ===
* Perl Regular Expressions Quick Start
* Perl Regular Expressions Tutorial
* Perl Regular Expressions Reference
* Perl Regular Expression Character Classes
* Perl Regular Expression Backslash Sequences and Escapes

=== .NET ===
* Regular Expression Language - Quick Reference

=== sed ===
* Regular Expressions: selecting text
* BRE syntax
* ERE syntax

== Special Topics ==

=== Special Characters ===

==== ERE special characters ====
An ERE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash, such a character is an ERE that matches the special character itself. The extended regular expression special characters and the contexts in which they have their special meaning are:

;'''. \ [ ('''
:The period, left-bracket, backslash and left-parenthesis are special except when used in a bracket expression. Outside a bracket expression, a left-parenthesis immediately followed by a right-parenthesis produces undefined results.
;''')'''
:The right-parenthesis is special when matched with a preceding left-parenthesis, both outside a bracket expression.
;'''* + ? {'''
:The asterisk, plus-sign, question-mark and left-brace are special except when used in a bracket expression (see RE Bracket Expression). Any of the following uses produces undefined results:
:*if these characters appear first in an ERE, or immediately following a vertical-line, circumflex or left-parenthesis.
:*if a left-brace is not part of a valid interval expression.
;'''|'''
:The vertical-line is special except when used in a bracket expression. A vertical-line appearing first or last in an ERE, or immediately following a vertical-line or a left-parenthesis, or immediately preceding a right-parenthesis, produces undefined results.
;'''^'''
:The circumflex is special when used:
:*as an anchor
:*as the first character of a bracket expression
;'''$'''
:The dollar sign is special when used as an anchor.

==== BRE special characters ====
A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are:

;'''. [ \'''
:The period, left-bracket and backslash are special except when used in a bracket expression (see RE Bracket Expression). An expression containing a [ that is not preceded by a backslash and is not part of a bracket expression produces undefined results.
;'''*'''
:The asterisk is special except when used:
:*in a bracket expression
:*as the first character of an entire BRE (after an initial ^, if any)
:*as the first character of a subexpression (after an initial ^, if any); see BREs Matching Multiple Characters.
;'''^'''
:The circumflex is special when used:
:*as an anchor (see BRE Expression Anchoring)
:*as the first character of a bracket expression (see RE Bracket Expression).
;'''$'''
:The dollar sign is special when used as an anchor.

=== Formal rules for bracket expression ===
Bracket expressions such as [0-9a-zA-Z], [^0-9a-zA-Z], or [0-9a-zA-Z.?*+-] follow different rules from the rest of a regular expression. One of the most important differences is the handling of metacharacters: as the lists above show, most special characters lose their special meaning inside a bracket expression. A more formal and detailed description of bracket expressions can be found in the following:
* RE Bracket Expression in Single UNIX Specification by X/Open group
* RE Bracket Expression in POSIX.1-2008

=== Capturing, Grouping and Backreferences ===
* Backreferences in Regular Expressions
* Replacement Strings Reference: Matched Text and Backreferences

=== NOT operator in Regex ===
* How to use NOT operator
* Not Operator in Regular Expressions

=== Nested pairs search ===
* Can I use Perl regular expressions to match balanced text?
* .NET Regular Expressions: Regex and Balanced Matching
* Regex Recursion (Matching Nested Constructs)

=== Lookaround: lookahead and lookbehind ===
* Lookahead and Lookbehind Zero-Width Assertions
* Regular expression to match text that *doesn't* contain a word? (Negating a word)

=== Greedy, Reluctant, or Possessive Quantifiers ===
* Quantifiers in The Java Tutorials
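
The difference between greedy, reluctant and possessive quantifiers is easiest to see by running the same input through all three forms. The following is a minimal Java sketch, not taken from any of the linked pages: the class name QuantifierDemo, the helper firstMatch and the sample input are hypothetical and exist only for this illustration. It uses only java.util.regex.Pattern and java.util.regex.Matcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuantifierDemo {

    /** Returns the first substring of input matched by regex, or null if nothing matches. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";

        // Greedy: .* takes everything, then backtracks until "foo" fits.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo

        // Reluctant: .*? takes as little as possible, so the first "foo" ends the match.
        System.out.println(firstMatch(".*?foo", input));  // xfoo

        // Possessive: .*+ takes everything and never backtracks, so "foo" can no longer match.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```

The possessive form returns no match here because .*+ has already consumed the trailing foo and refuses to give it back. That behaviour is useful when a failed match should fail fast, but it is easy to apply by accident to a pattern that relies on backtracking.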